
Human Genomics

Springer Science and Business Media LLC

All preprints, ranked by how well they match Human Genomics's content profile, based on 13 papers previously published here. The average preprint has a 0.05% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.

1
Post-Mendelian genetic model in COVID-19

Picchiotti, N.; Benetti, E.; Fallerini, C.; Daga, S.; Baldassarri, M.; Fava, F.; Zguro, K.; Valentino, F.; Doddato, G.; Giliberti, A.; Tita, R.; Amitrano, S.; Bruttini, M.; Di Sarno, L.; Iuso, N.; Alaverdian, D.; Beligni, G.; Croci, S.; Meloni, I.; Pinto, A. M.; Gabbi, C.; Ceri, S.; Esposito, A.; Pinoli, P.; Crawley, F. P.; Frullanti, E.; Mari, F.; GEN-COVID Multicenter Study; Gori, M.; Renieri, A.; Furini, S.

2021-01-29 genetic and genomic medicine 10.1101/2021.01.27.21250593
Top 0.1%
134× avg

Host genetics is an emerging theme in COVID-19, and a few common polymorphisms and some rare variants have been identified by GWAS and candidate-gene approaches, respectively. However, an organic model is still missing. Here, we propose a new model that takes into account common and rare germline variants, applied to a cohort of 1,300 Italian SARS-CoV-2-positive individuals. Ordered logistic regression of clinical WHO grading on sex and age was used to obtain a binary phenotypic classification. Genetic variability from WES was synthesized in several boolean representations differentiated according to allele frequencies and genotype effect. LASSO logistic regression was used to extract relevant genes. We defined about 100 common driver polymorphisms corresponding to the classical "threshold model". Extracted genes were shown to be gender specific. Stochastic, rare, more penetrant events in about 100 additional extracted genes, when they occur in a medium or severe background (common within the family), simulate Mendelian inheritance in 14% of subjects (having only 1 mutation) or oligogenic inheritance (in 10% having 2 mutations, in 11% having 3 mutations, etc.). The combined effect of common and rare variants can be described as an integrated polygenic score computed as $(n_{\text{severity}} - n_{\text{mildness}}) + F\,(m_{\text{severity}} - m_{\text{mildness}})$, where n is the number of common driver genes, m is the number of driver rare variants, and F is a factor that appropriately weights the more powerful rare variants. We called the model "post-Mendelian". The model describes the cohort well, and patients are clustered as severe or mild by the integrated polygenic scores, with the F factor calibrated around 2 and a prediction capacity of 65% in males and 70% in females. In conclusion, this is the first comprehensive model interpreting host genetics in a holistic post-Mendelian manner. Further validation is needed to consolidate and refine the model, which nevertheless holds true in thousands of SARS-CoV-2-positive Italian subjects.
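
The integrated polygenic score quoted in this abstract can be illustrated with a minimal Python sketch. The function names, the decision threshold of 0, and the example counts are assumptions made for illustration; only the formula itself and the value F ≈ 2 come from the abstract.

```python
def integrated_polygenic_score(n_severity, n_mildness,
                               m_severity, m_mildness, f=2.0):
    """Compute (n_severity - n_mildness) + F * (m_severity - m_mildness).

    n_*: counts of common driver genes associated with severity / mildness.
    m_*: counts of driver rare variants associated with severity / mildness.
    f:   weight for rare variants (the abstract reports F calibrated around 2).
    """
    return (n_severity - n_mildness) + f * (m_severity - m_mildness)


def classify(score, threshold=0.0):
    """Label a subject from the score. The cutoff of 0 is an assumption,
    not a value reported in the abstract."""
    return "severe" if score > threshold else "mild"


# Hypothetical subject: 5 severity and 3 mildness common drivers,
# 2 severity and 1 mildness rare variants.
example_score = integrated_polygenic_score(5, 3, 2, 1)
example_label = classify(example_score)
```

With F = 2, one net rare severity variant counts as much as two net common severity genes, which is the weighting idea the abstract describes.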

2
The genetic basis of dermatophytosis skin infection susceptibility

Haapaniemi, H.; Eghtedarian, R.; Tervi, A.; Estonian Biobank Research Team; FinnGen; Abner, E.; Ollila, H. M.

2024-11-26 dermatology 10.1101/2024.11.25.24317872
Top 0.1%
107× avg

Dermatophytosis is an infection caused by fungi that utilize keratinized tissues, such as skin, nails, and hair, as their energy source. This infection commonly presents as red, itchy, ring-like patches on the skin, nail thickening, or hair loss. With ever-increasing case numbers, it has become a significant public health concern estimated to affect 20% of the world's population. Despite the high prevalence, the genetic risk factors for dermatophytosis are poorly understood. Our goal was to elucidate the biological mechanisms underlying individual susceptibility to dermatophytosis and to explore its genetic associations with other diseases and traits. We performed a large-scale genome-wide association meta-analysis of dermatophytosis infections with over 250,000 cases and 1,370,000 controls using data from FinnGen, Estonian Biobank, UK Biobank and the Million Veteran Program. We identified 30 genome-wide significant loci, including seven missense variants and two variants in high linkage disequilibrium with missense variants. The strongest associations were with variants within or closest to the ZNF646 ($p = 6.60 \times 10^{-79}$, beta = 0.07), HLA-DQB1 ($p = 1.42 \times 10^{-36}$, beta = 0.05), FLG ($p = 1.96 \times 10^{-27}$, beta = -0.22), FTO ($p = 5.75 \times 10^{-26}$, beta = -0.04), SLURP2 ($p = 3.33 \times 10^{-24}$, beta = 0.04) and KRT77 ($p = 1.28 \times 10^{-15}$, beta = 0.03) genes. Overall, our findings implicate keratin lifecycle and skin integrity, immune defense, and obesity as risk factors for dermatophytosis. Our findings highlight the clinical comorbidities with other skin diseases and with high BMI and identify novel genetic variants, some of which are candidates for managing dermatophytosis infection.

3
ACE2 polymorphisms as potential players in COVID-19 outcome

Khayat, A. S.; Assumpcao, P. P. d.; Khayat, B. C. M.; Araujo, T. M. T.; Batista-Gomes, J. A.; Imbiriba, L. C.; Ishak, G.; Assumpcao, P. B. d.; Moreira, F. C.; Burbano, R. R.; Ribeiro-dos-Santos, A. M.; Ribeiro-dos-Santos, A. K.; Santos, N. P. C. d.; Santos, S. E. B. d.

2020-05-29 genetic and genomic medicine 10.1101/2020.05.27.20114843
Top 0.1%
98× avg

The clinical condition COVID-19, caused by SARS-CoV-2, was declared a pandemic by the WHO in March 2020. Currently, there are more than 5 million cases worldwide, and the pandemic has increased exponentially in many countries, with different incidences and death rates among regions/ethnicities and, intriguingly, between sexes. In addition to the many factors that can influence these discrepancies, we suggest a biological aspect, the genetic variation at the viral S protein receptor in human cells, ACE2 (angiotensin I-converting enzyme 2), which may contribute to the worse clinical outcome in males and in some regions worldwide. We performed exomics analysis in native and admixed South American populations, and we also conducted in silico genomics databank investigations in populations from other continents. Interestingly, at least ten polymorphisms in coding, noncoding and regulatory sites were found that can shed light on this issue and offer a plausible biological explanation for these epidemiological differences. In conclusion, ACE2 polymorphisms should influence epidemiological discrepancies observed among ancestry and, moreover, between sexes.

4
Uncharacterized MYH9 Germline Mutations in a Microcystic Adnexal Carcinoma Mimicker: Benign Deep Syringoid Ductal Proliferation (BDSDP) with Elastic Fiber Aggregation

Hanlon, E.; Brown, K. L.; Michel, H. M.; Roby, C.; Urbano, M. G.; Mahmutovic, D.; Khatiwada, P.; Gay, J.; Brown, A. M.; Grider, D. J.; Finkielstein, C. V.

2025-08-05 dermatology 10.1101/2025.08.03.25332336
Top 0.1%
97× avg

Importance: Microcystic adnexal carcinoma (MAC) is a rare, locally aggressive sweat gland neoplasm sometimes misdiagnosed due to its histologic similarities to benign adnexal proliferations. MYH9-associated elastin aggregation syndrome (MALTA) is an inherited condition characterized by benign MAC-like ductal lesions and by abnormal elastic fiber deposition. Objective: To report previously uncharacterized heterozygous germline mutations in the MYH9 gene in a patient presenting benign deep syringoid ductal proliferations and papillary dermal elastic fiber aggregation. Design, Setting, Participants: Clinical report with genetic and structural analysis. Dermatology outpatient. A male in their 20s presenting with long-standing, stable erythematous nodules on the right infraorbital region and left zygomatic arch. Genetic testing of first-degree relatives and structural simulations were performed to assess variant impact. Main Outcomes and Measures: Histological evaluation of the patient's lesions revealed benign deep syringoid ductal proliferations with papillary dermal elastic fiber aggregation, distinguishing them from microcystic adnexal carcinoma. Germline genetic testing identified three heterozygous MYH9 variants, two previously uncharacterized, all showing Mendelian segregation in first-degree relatives and associated with structural rearrangement. Results: Histologic evaluation of the facial lesions revealed keratin-filled microcysts and deep dermal and subcutaneous cords with ductal structures resembling MAC. Immunohistochemistry showed apocrine differentiation (EMA+/CD15+/GCDP+) and basaloid myoepithelial cells positive for p63. No evidence of perineural invasion was observed. Elastic tissue staining showed dense, ball-like aggregates of elastic fibers in the papillary dermis. Germline testing identified c.1363G>A (p.Gly455Ser) in the myosin head domain, and c.4490G>A (p.Arg1497Gln) and c.4876A>G (p.Ile1626Val) in the tail domain of Myosin-9. Saliva-based testing confirmed Mendelian segregation in multiple first-degree relatives. Missense mutations were predicted to alter the coiled-coil structure, potentially disrupting chain interactions and affecting the motif's parallel versus antiparallel orientation. Conclusions and Relevance: This case broadens the phenotypic and genotypic spectrum of MALTA syndrome and introduces the diagnostic term benign deep syringoid ductal proliferation (BDSDP) with elastic fiber aggregation. The findings underscore the diagnostic challenges in distinguishing BDSDP from MAC and highlight the critical role of integrating histopathologic, immunohistochemical, and genetic data in accurate diagnosis. These results support the need for further investigation into MYH9-associated adnexal neoplasia and its underlying molecular mechanisms. Key Points. Question: How do germline MYH9 variants contribute to the pathogenesis of benign deep syringoid ductal proliferations with elastic fiber aggregation, a phenotype that clinically and histologically mimics microcystic adnexal carcinoma? Findings: Genetic analysis revealed two previously unreported heterozygous variants in the MYH9 gene: c.1363G>A (p.Gly455Ser), located in the myosin head domain, a region previously associated with MALTA syndrome, and a variant in the myosin tail domain, c.4490G>A (p.Arg1497Gln). A third mutation, c.4876A>G (p.Ile1626Val), was also detected. All three variants demonstrated Mendelian segregation from the parents, were identified in multiple family members, and were predicted to cause structural perturbations. Meaning: These findings provide strong evidence for a heritable contribution of these mutations to the observed phenotype. The presence of these MYH9 variants highlights a potential functional impact on protein structure and activity. This pattern supports the hypothesis that MYH9 mutations may underlie or modify the pathogenesis of benign syringoid ductal proliferations, expanding the known spectrum of MYH9-associated conditions and offering a molecular basis for improved diagnosis and familial risk assessment.

5
New RDEB intermediate variant with in-frame partial exon skipping in FN-III-like domain of type VII collagen.

Evtushenko, N.; Kubanov, A.; Martynova, A.; Kondratyev, N.; Beilin, A.; Karamova, A.; Monchakovskaya, E.; Azimov, K.; Nefedova, M.; Bozhanova, N.; Zaklyazminskaya, E.; Gurskaya, N.

2022-09-04 dermatology 10.1101/2022.09.02.22278356
Top 0.1%
96× avg

Recessive Dystrophic Epidermolysis Bullosa (RDEB) is a debilitating genodermatosis caused by pathogenic mutations in the COL7A1 gene, which lead to the absence of, or a reduction in the number of, anchoring fibrils. The severity of RDEB depends on the mutation type and localization, but many aspects of this dependence remain to be elucidated. Here, we report a novel variant of intermediate RDEB in two unrelated patients. Their disease manifestation includes early skin and oral mucosa blistering and is associated with localized atrophic scarring. According to the exome and Sanger sequencing results, both probands carry complex heterozygosity in the COL7A1 gene, with the same deletion in intron 19. RT-PCR followed by sequence analysis revealed skipping of part of exon 19, as well as rescue of the open reading frame (ORF) of COL7A1, in both probands. We hypothesize that the mutation in the acceptor splice site leads to the activation of the cryptic donor splice site, resulting in a truncated but partially functional protein and the milder phenotype of intermediate RDEB. This rare type of mutation expands our understanding of RDEB etiology and invites further investigation.

6
InpherNet provides attractive monogenic disease gene hypotheses using patient genes' indirect neighbors

Yoo, B.; Birgmeier, J.; Bernstein, J. A.; Bejerano, G.

2020-07-11 genetic and genomic medicine 10.1101/2020.07.10.20150425
Top 0.1%
95× avg

Close to 70% of patients suspected to have a Mendelian disease remain undiagnosed after genome sequencing, partly because our current knowledge about disease-causing genes is incomplete. Although hundreds of new disease-causing genes are discovered every year, the discovery rate has been constant for over a decade. Generating an attractive novel disease gene hypothesis from patient data can be time-consuming, as each patient's genome can contain dozens to hundreds of rare, possibly pathogenic variants. To generate the most plausible hypothesis, many sources of indirect evidence about each candidate variant may be considered. We introduce InpherNet, a network-based machine learning approach to accelerate this process. InpherNet ranks candidate genes based on gene neighbors from 4 graphs: orthologs, paralogs, functional pathway members, and co-localized interaction partners. As such, InpherNet can be used both to prioritize potentially novel disease genes and to help reveal known disease genes whose direct annotation is missing or partial. InpherNet is applied to over 100 patient cases for whom the causative gene is incorrectly given low priority by two clinical gene ranking methods that rely exclusively on human patient-derived evidence. It correctly ranks the causative gene among its top 5 candidates in 68% of the cases, compared to 9-44% using comparable tools including Phevor, Phive and hiPhive.

7
No evidence for allelic association between Covid-19 and ACE2 genetic variants by direct exome sequencing in 99 SARS-CoV-2 positive patients

Novelli, A.; Biancolella, M.; Borgiani, P.; Cocciadiferro, D.; Colona, V. L.; D'Apice, M. R.; Rogliani, P.; Zaffina, S.; Leonardis, F.; Campana, A.; Raponi, M.; Andreoni, M.; Grelli, S.; Novelli, G.

2020-05-26 genetic and genomic medicine 10.1101/2020.05.23.20111310
Top 0.1%
94× avg

Background: Coronaviruses (CoV) are a large family of viruses that are common in people and many animal species. Animal coronaviruses rarely infect humans, with the exceptions of the Middle East Respiratory Syndrome coronavirus (MERS-CoV), the Severe acute respiratory syndrome coronavirus (SARS-CoV), and now SARS-CoV-2, which is the cause of the ongoing pandemic of coronavirus disease 2019 (COVID-19). Many studies have suggested that genetic variants in the ACE2 gene may influence host susceptibility or resistance to SARS-CoV-2, given the functional role of ACE2 in human pathophysiology. However, all these studies were conducted in silico, based on epidemiological and population data. We therefore investigated the occurrence of ACE2 variants in a cohort of 99 unrelated Italian individuals clinically diagnosed with COVID-19 to experimentally demonstrate allelic association with disease severity. Methods: By whole-exome sequencing we analysed 99 DNA samples from patients with severe and extremely severe COVID-19 hospitalized at the University Hospital of Rome "Tor Vergata" and Bambino Gesu Hospital in Rome. Results: We identified three different germline variants, one intronic (c.439+4G>A) and two missense (c.2158A>G, p.Asn720Asp; c.1888G>C, p.Asp630His), in 26 patients, with a similar frequency in males and females and a frequency not statistically different from that of the ethnically matched population (EUR), except for c.1888G>C (p.Asp630His). Conclusions: Our results suggest that there is no ACE2 exonic allelic association with disease severity. It is possible that rare susceptibility alleles are located in the non-coding regions of the gene that control ACE2 gene activity. It is therefore of interest to explore the existence of ACE2 susceptibility alleles to SARS-CoV-2 in these regulatory regions. In addition, we found no significant evidence that ACE2 alleles are associated with disease severity or sex bias in the Italian population.

8
The predictive capacity of polygenic risk scores for disease risk is only moderately influenced by imputation panels tailored to the target population

Levi, H.; Elkon, R.; Shamir, R.

2023-08-29 genetic and genomic medicine 10.1101/2023.08.29.23294769
Top 0.1%
90× avg

Polygenic risk scores (PRS) predict individuals' genetic risk of developing complex diseases. They summarize the effect of many genetic variants discovered in genome-wide association studies (GWASs). However, to date, large GWASs exist primarily for the European population, and the quality of PRS prediction declines when applied to target sets of other ethnicities. A key step in using a PRS is imputation, which is the inference of untyped SNPs using a set of fully sequenced individuals, called the imputation panel. The SNP genotypes called by the imputation process depend on the ethnic composition of the imputation panel. Several studies have shown that imputing genotypes using a panel that contains individuals of the same ethnicity as the genotyped individuals improves imputation accuracy. However, until now, there has been no systematic investigation into the influence of the ethnic composition of imputation panels on the accuracy of PRS predictions when applied to ethnic groups that differ from the population used in the GWAS. In this study we estimated the effect of imputation of the target set on the prediction accuracy of PRS when the discovery (GWAS) and the target sets come from different ethnic groups. We analyzed twelve binary phenotypes and three populations from the UK Biobank (Europeans, South Asians, and Africans). We generated imputation panels from several ethnic groups, imputed the target set using each panel, and generated PRS to compute individuals' risk scores. Then, we assessed the prediction accuracy obtained from each imputation panel. Our analysis indicates that using an imputation panel matched to the ethnic group of the target population yields only a marginal improvement, and only under specific conditions. Hence, while a target-matched imputation panel can potentially improve the prediction accuracy of European PRSs in non-EUR populations, the improvement is limited.
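
As a rough illustration of how a PRS summarizes many variant effects, the following Python sketch computes the common additive form: a sum of GWAS effect sizes weighted by allele dosage. This is a generic textbook formulation, not necessarily the exact pipeline of this study; the SNP identifiers, weights and dosages are hypothetical.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Additive PRS: sum over SNPs of effect size * effect-allele dosage.

    dosages:      SNP id -> dosage in [0, 2]; imputed dosages may be fractional.
    effect_sizes: SNP id -> per-allele effect (e.g. log odds ratio) from a GWAS.
    SNPs missing from either mapping are skipped.
    """
    shared = dosages.keys() & effect_sizes.keys()
    return sum(effect_sizes[snp] * dosages[snp] for snp in shared)


# Hypothetical GWAS weights and one individual's (partly imputed) dosages.
weights = {"snp_a": 0.10, "snp_b": -0.20, "snp_c": 0.05}
person = {"snp_a": 2, "snp_b": 1, "snp_c": 0.5, "snp_d": 1}
score = polygenic_risk_score(person, weights)
```

Because imputation changes the dosage values fed into this sum, the choice of imputation panel can shift individual scores, which is the effect the study sets out to quantify.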

9
PATHOS: Predicting Variant Pathogenicity by Combining Protein Language Models and Biological Features

Radjasandirane, R.; Cretin, G.; Diharce, J.; de Brevern, A. G.; Gelly, J.-C.

2025-12-27 genetic and genomic medicine 10.64898/2025.12.22.25342839
Top 0.1%
82× avg

Predicting the pathogenic impact of missense variants is essential for understanding and diagnosing genetic diseases. Prediction approaches have evolved significantly, with the latest methods based on deep learning. Nonetheless, only a limited number exploit the potential of Protein Language Models (PLMs), which have demonstrated strong performance across various protein-related tasks. A new predictor, called PATHOS, was developed; it combines embeddings from an optimal set of two PLMs, namely ESM C 600M and Ankh 2 Large. Their embeddings were combined with additional crucial biological features such as phylogenetic probabilities, allele frequency, and protein annotations; they were aggregated using a fully connected layer architecture. Compared to 65 other predictors on clinical data, PATHOS outperforms the state of the art. It achieves a Matthews Correlation Coefficient (MCC) of 0.591 on a manually and carefully curated clinical dataset and 0.826 on a ClinVar dataset, surpassing other leading tools. Furthermore, case studies on the progesterone receptor and the KCNQ1 ion channel illustrate that PATHOS can identify functionally critical regions and known pathogenic mutations missed by other leading predictors like AlphaMissense. To ensure broad accessibility and facilitate use by non-specialists, a user-friendly web server containing a database of 140 million precomputed predictions for human proteins from Swiss-Prot was provided. The web server is available at: https://dsimb.inserm.fr/PATHOS/
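
For readers unfamiliar with the metric reported above, the Matthews Correlation Coefficient can be computed from a binary confusion matrix as in this minimal Python sketch. The function name and the example counts are illustrative assumptions; returning 0 when a margin is empty is a common convention, not something stated in the abstract.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Ranges from -1 (total disagreement) through 0 (chance level) to +1
    (perfect prediction). Returns 0.0 when the denominator is zero.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical confusion matrix for a pathogenic-vs-benign classifier.
example_mcc = matthews_corrcoef(tp=50, tn=50, fp=0, fn=0)
```

Unlike plain accuracy, MCC stays informative when pathogenic and benign variants are imbalanced, which is one reason it is often used to compare variant effect predictors.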

10
Re-evaluation of whole exome sequencing, including intronic region, in combination with genetic intolerance score for foetal structural anomalies, is helpful for diagnosis, especially in X-linked disorders

Taniguchi, K.; Hasegawa, F.; Okazaki, Y.; Hori, A.; Ogata-Kawata, H.; Aoto, S.; Migita, O.; Kawai, T.; Nakabayashi, K.; Okamura, K.; Fukui, K.; Wada, S.; Ozawa, K.; Ito, Y.; Sago, H.; Hata, K.

2023-02-07 genetic and genomic medicine 10.1101/2023.02.05.23285039
Top 0.1%
80× avg

Background: Whole-exome sequencing (WES) is a strong diagnostic tool for foetal structural anomalies, but the causative gene for more than half the anomalies has not been identified. Therefore, improving the diagnostic yield based on WES data is essential. Methods: First, 138 foetuses with structural anomalies were assessed using conventional WES and copy number variation (CNV) analyses. For undiagnosed cases, we employed a three-step approach for diagnosis. We re-evaluated 1) candidate variants using a loss-of-function observed/expected upper bound fraction (LOEUF) score, 2) all variants of disease-causing genes for clinically diagnosed cases using spliceAI, and 3) the rare variants in all genes with a low LOEUF score (< 0.35) using spliceAI. Results: We identified molecular diagnoses in 53 of 138 cases (38.4%) using conventional WES and CNV. Among the undiagnosed cases, in the first step, we diagnosed two X-linked recessive diseases. In the second step, we diagnosed Meckel-Gruber syndrome by detecting a likely pathogenic intronic variant in TMEM67. In the third step, we identified a de novo hemizygous pathogenic variant in one male with severe hydrops fetalis, which caused aberrant splicing in CASK. We found a novel phenotype, hydrops fetalis, in CASK-related X-linked dominant disorder. Moreover, we revealed that the LOEUF score of X-linked disease-causing genes was significantly lower than that of autosomal genes among all OMIM-registered genes. Conclusion: We showed that the evaluation of variants from WES data, including intronic variants, in combination with the LOEUF score could improve the WES diagnostic yield and be useful for evaluation of variants, especially on chromosome X. What is already known on this topic? Molecular genetic diagnosis of foetal structural anomalies using WES is being increasingly implemented. However, more than half of the cases cannot be diagnosed. There is significant potential to increase the diagnostic yield by re-analysing WES data. What this study adds: In the present study, we focused on loss-of-function observed/expected upper bound fraction (LOEUF) scores to quantify genetic intolerance, and on additional intron analysis for undiagnosed cases using conventional WES data. These approaches enabled the appropriate evaluation of candidate variants and detected overlooked candidate variants in introns. We diagnosed two X-linked recessive disorder cases (Hardikar syndrome and Ritscher-Schinzel syndrome) by re-evaluating candidate variants using the LOEUF score. We also diagnosed one Meckel-Gruber syndrome case caused by an intronic pathogenic variant that had been overlooked by the conventional method. Moreover, evaluating all variants, including intronic ones, in genes with a low LOEUF score (2,971 genes) that could cause haploinsufficiency helped us find a pathogenic intronic variant in CASK in one hydrops fetalis case, which revealed that CASK-related X-linked dominant disorder could cause hydrops fetalis as a severe phenotype. Finally, the LOEUF score of X-linked genes was significantly lower than that of autosomal genes among all OMIM-registered genes, which means that gene evaluation using the LOEUF score was helpful for genetic diagnosis, especially for genes on chromosome X. How this study might affect research, practice, or policy: The evaluation of variants, including intronic variants, in combination with the LOEUF score is expected to contribute to the improvement of the diagnostic yield in WES. These approaches are easy and convenient to implement. The LOEUF score might be useful for evaluation of variants, especially on chromosome X.
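
The third re-evaluation step, restricting attention to rare variants in genes with a low LOEUF score, can be sketched in Python as a simple filter. Only the 0.35 cutoff comes from the abstract; the gene names, scores and variant records below are hypothetical, and a real pipeline would take LOEUF values from a constraint resource and pass the surviving variants on to a splicing predictor.

```python
LOEUF_CUTOFF = 0.35  # cutoff quoted in the abstract


def variants_in_constrained_genes(variants, loeuf_by_gene, cutoff=LOEUF_CUTOFF):
    """Keep variants whose gene has a LOEUF score strictly below the cutoff.

    variants:      list of dicts, each with at least a "gene" key.
    loeuf_by_gene: gene symbol -> LOEUF score; genes without a score are dropped.
    """
    return [
        v for v in variants
        if loeuf_by_gene.get(v["gene"], float("inf")) < cutoff
    ]


# Hypothetical scores and variants, for illustration only.
loeuf = {"GENE_A": 0.12, "GENE_B": 0.80, "GENE_C": 0.35}
calls = [
    {"gene": "GENE_A", "id": "var1"},
    {"gene": "GENE_B", "id": "var2"},
    {"gene": "GENE_C", "id": "var3"},
    {"gene": "GENE_D", "id": "var4"},
]
kept = variants_in_constrained_genes(calls, loeuf)
```

A lower LOEUF score indicates stronger intolerance to loss-of-function variation, so this filter narrows the search to genes where a splice-disrupting variant is more likely to be consequential.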

11
A curated census of pathogenic and likely pathogenic UTR variants and evaluation of deep learning models for variant effect prediction

Bohn, E.; Lau, T.; Wagih, O.; Masud, T.; Merico, D.

2023-07-12 genetic and genomic medicine 10.1101/2023.07.10.23292474
Top 0.1%
68× avg

Variants in 5′ and 3′ untranslated regions (UTRs) contribute to rare disease. While predictive algorithms to assist in classifying pathogenicity can potentially be highly valuable, the utility of these tools is often unclear, as it depends on carefully selected training and validation conditions. To address this, we developed a high-confidence set of pathogenic (P) and likely pathogenic (LP) variants and assessed deep learning (DL) models for predicting their molecular effect. 3′ and 5′ UTR variants documented as P or LP (P/LP) were obtained from ClinVar and refined by reviewing the annotated variant effect and reassessing evidence of pathogenicity following published guidelines. Prediction scores from sequence-based DL models were compared between three groups: P/LP variants acting through the mechanism for which the model was designed (model-matched), those operating through other mechanisms (model-mismatched), and putative benign variants. PhyloP was used to compare conservation scores between P/LP and putative benign variants. 295 3′ and 188 5′ UTR variants were obtained from ClinVar, of which 26 3′ and 68 5′ UTR variants were classified as P/LP. Predictions by DL models achieved statistically significant differences when comparing model-matched P/LP variants to both putative benign variants and model-mismatched P/LP variants, as well as when comparing all P/LP variants to putative benign variants. PhyloP conservation scores were significantly higher among P/LP compared to putative benign variants for both the 3′ and 5′ UTR. In conclusion, we present a high-confidence set of P/LP 3′ and 5′ UTR variants spanning a range of mechanisms and supported by detailed pathogenicity and molecular mechanism evidence curation. Predictions from DL models further substantiate these classifications. These datasets will support further development and validation of DL algorithms designed to predict the functional impact of variants that may be implicated in rare disease.

12
Using computational approaches to enhance the interpretation of missense variants in the PAX6 gene

Andhika, N. S.; Biswas, S.; Hardcastle, C.; Green, D.; Ramsden, S.; Birney, E. J.; Black, G. C.; Sergouniotis, P.

2023-12-26 genetic and genomic medicine 10.1101/2023.12.21.23300370
Top 0.1%
66× avg

Purpose: The PAX6 gene encodes a highly conserved transcription factor involved in eye development. Heterozygous loss-of-function variants in PAX6 can cause a range of ophthalmic disorders including aniridia. A key molecular diagnostic challenge is that many PAX6 missense changes are presently classified as variants of uncertain significance. While computational tools can be used to assess the effect of genetic alterations, the accuracy of their predictions varies. Here, we evaluated and optimised the performance of computational prediction tools in relation to PAX6 missense variants. Methods: Through inspection of publicly available resources (including HGMD, ClinVar, LOVD and gnomAD), we identified 241 PAX6 missense variants that were used for model training and evaluation. The performance of ten commonly used computational tools was assessed and a threshold optimization approach was utilized to determine optimal cut-off values. Validation studies were subsequently undertaken using PAX6 variants from a local database. Results: AlphaMissense, SIFT4G and REVEL emerged as the best-performing predictors; the optimized thresholds of these tools were 0.967, 0.025, and 0.772, respectively. Combining the predictions from these top three tools resulted in lower performance compared to using AlphaMissense alone. Conclusion: Tailoring the use of computational tools by employing optimized thresholds specific to PAX6 can enhance algorithmic performance. Our findings have implications for PAX6 variant interpretation in clinical settings.
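
The abstract does not say which threshold optimization criterion was used. As one plausible illustration, the Python sketch below picks the cutoff that maximizes Youden's J (sensitivity + specificity - 1); that criterion, the function name and the toy scores are all assumptions. The `higher_is_pathogenic` flag covers tools such as SIFT, where lower scores indicate a more damaging variant.

```python
def best_threshold(scores, labels, higher_is_pathogenic=True):
    """Return (threshold, J) maximizing Youden's J over observed scores.

    scores: predictor outputs, one per variant.
    labels: 1 for pathogenic, 0 for benign.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one pathogenic and one benign variant")

    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        if higher_is_pathogenic:
            predicted = [s >= t for s in scores]
        else:
            predicted = [s <= t for s in scores]
        tp = sum(1 for p, y in zip(predicted, labels) if p and y)
        tn = sum(1 for p, y in zip(predicted, labels) if not p and not y)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Toy data: a score where higher means more likely pathogenic.
cutoff, youden_j = best_threshold([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1])
```

A gene-specific cutoff found this way should be checked on held-out variants, as the authors did with variants from a local database, because a threshold tuned on a small set can overfit.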

13
Evaluation of a genetic risk score for severity of COVID-19 using human chromosomal-scale length variation.

Toh, C.; Brody, J. P.

2020-07-07 genetic and genomic medicine 10.1101/2020.07.06.20147637
Top 0.1%
66× avg

Introduction: The course of COVID-19 varies from asymptomatic to severe (acute respiratory distress, cytokine storms, and death) in patients. The basis for this range in symptoms is unknown. One possibility is that genetic variation is responsible for the highly variable response to infection. We evaluated how well a genetic risk score based on chromosome-scale length variation and machine learning classification algorithms could predict severity of response to SARS-CoV-2 infection. Methods: We compared 981 patients from the UK Biobank dataset who had a severe reaction to SARS-CoV-2 infection before 27 April 2020 to a similar number of age-matched patients drawn from the general UK Biobank population. For each patient, we built a profile of 88 numbers characterizing the chromosome-scale length variability of their germline DNA. Each number represented one quarter of one of the 22 autosomes. We used the machine learning algorithm XGBoost to build a classifier that could predict whether a person would have a severe reaction to COVID-19 based only on their 88-number classification. Results: We found that the XGBoost classifier could differentiate between the two classes at a significant level, p = 2 × 10^[exponent missing] as measured against a randomized control and p = 3 × 10^[exponent missing] measured against the expected value of a random guessing algorithm (AUC = 0.5). However, we found that the AUC of the classifier was only 0.51, too low for a clinically useful test.
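
To make the reported AUC of 0.51 concrete, this Python sketch computes the area under the ROC curve directly from its rank interpretation: the probability that a randomly chosen severe case receives a higher score than a randomly chosen control, with ties counted as one half. The function name and the toy scores are assumptions; this is a generic calculation, not the authors' evaluation code.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    scores: classifier outputs, higher meaning more likely severe.
    labels: 1 for severe cases, 0 for controls.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy example with four individuals.
toy_auc = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

An AUC of 0.5 corresponds to random guessing, so a value of 0.51 can be statistically distinguishable from chance in a large sample while still being of little practical use, which matches the abstract's conclusion.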

14
Associations between polygenic risk score and COVID-19 severity in Russian population using low-pass genome sequencing

Nostaeva, A.; Shimansky, V.; Apalko, S.; Kuznetsov, I.; Sushentseva, N.; Popov, O.; Aulchenko, Y.; Shcherbak, S.

2023-11-20 genetic and genomic medicine 10.1101/2023.11.20.23298335
Top 0.1%
65× avg

The course of COVID-19 is characterized by wide variability, with genetics playing a contributing role. Through large-scale genetic association studies, a significant link between genetic variants and disease severity was established. However, individual genetic variants identified thus far have shown modest effects, indicating a polygenic nature of this trait. To address this, a polygenic risk score (PRS) can be employed to aggregate the effects of multiple single nucleotide polymorphisms (SNPs) into a single number, allowing practical application to individuals within a population. In this work, we investigated the performance of a PRS model in the context of COVID-19 severity in 1,085 Russian participants using low-coverage NGS sequencing. By developing a genome-wide PRS model based on summary statistics from the COVID-19 Host Genetics Initiative consortium, we demonstrated that the PRS, which incorporates information from over a million common genetic variants, can effectively identify individuals at significantly higher risk for severe COVID-19. The findings revealed that individuals in the top 10% of the PRS distribution had a markedly elevated risk of severe COVID-19, with an odds ratio (OR) of 2.1 (95% confidence interval (CI): 1.4-3.2, p-value = 0.00046). Furthermore, incorporating the PRS into the prediction model significantly improved its accuracy compared to a model that solely relied on demographic information (p-value < 0.0001). This study highlights the potential of PRS as a valuable tool for identifying individuals at increased risk of severe COVID-19 based on their genetic profile.
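
An odds ratio with a 95% confidence interval, like the one reported for the top PRS decile, can be computed from a 2×2 table with the standard Wald method, as in the Python sketch below. The counts are hypothetical and the abstract does not state how its interval was obtained (a regression model is equally possible), so this is only an illustration of the quantity being reported.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: severe cases in the high-PRS group     b: non-severe in the high-PRS group
    c: severe cases in the remaining group    d: non-severe in the remaining group
    z: normal quantile (1.96 gives an approximate 95% interval).
    All four counts must be positive.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
or_value, ci_low, ci_high = odds_ratio_wald_ci(20, 10, 10, 20)
```

An interval that excludes 1, as in the reported 1.4-3.2, indicates that the elevated odds in the top decile are unlikely to be a chance finding.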

15
Mutation Pathogenicity Prediction by a Biology Based Explainable AI Multi-Modal Algorithm

Kellerman, R.; Nayshool, O.; Barel, O.; Paz, S.; Amariglio, N.; Klang, E.; Rechavi, G.

2024-06-05 genetic and genomic medicine 10.1101/2024.06.05.24308476
Top 0.1%
63× avg

Most known pathogenic mutations occur in protein-coding regions of DNA and change the way proteins are made. Deciphering the protein structure therefore provides great insight into the molecular mechanisms underlying biological functions in human disease. While there have recently been major advances in the artificial intelligence-based prediction of protein structure, the determination of the biological and clinical relevance of specific mutations is not yet up to clinical standards. This challenge is of utmost medical importance when decisions as critical as suggesting termination of pregnancy or recommending cancer-directed rational drugs depend on the accuracy of prediction of the effect of the specific mutation. Currently available tools aim to characterize the effect of a mutation on the functionality of the protein according to biochemical criteria, independent of the biological context. A specific change in protein structure can result in either loss-of-function (LOF) or gain-of-function (GOF), and the ability to identify the directionality of effect needs to be taken into consideration when interpreting the biological outcome of the mutation. Here we describe Triple-modalities Variant Interpretation and Analysis (TriVIAI), a tool incorporating three complementary modalities for improved prediction of missense mutation pathogenicity: a protein language model (pLM), a graph neural network (GNN) and a tabular model incorporating physical properties from the protein structure. The TriVIAI ensemble's predictions compare favorably with existing tools across various metrics, achieving an AUC-ROC of 0.887, a precision-recall curve (PRC) score of 0.68, and a Brier score of 0.16. The TriVIAI ensemble is also endowed with two major advantages compared to other available tools. 
The first is the incorporation of biological insights that allow differentiation between GOF mutations, which tend to cluster in specific hotspots and affect structure in a specific functional way, and LOF mutations, which are usually dispersed and can cripple the protein in a variety of ways. Importantly, the advantage over other available tools is more noticeable with GOF mutations, as their effect on the protein structure is less disruptive and can be misinterpreted by current variant prioritization strategies. The second significant advantage of TriVIAI is the explainability of the ensemble, which contrasts with other available AI-based pathogenicity prediction algorithms that remain a black box for users. This explainability feature is of major importance considering the clinical responsibility of the medical decision-makers using AI-based pathogenicity predictors.
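The abstract reports a Brier score for an ensemble of three models but does not say how the three predictions are combined. The sketch below assumes a plain unweighted average, which is only one possible choice, and shows how a Brier score is computed from predicted probabilities and binary outcomes. The function and parameter names are hypothetical.

```python
def ensemble_probability(p_plm, p_gnn, p_tabular):
    """Combine three per-model pathogenicity probabilities.
    Assumption: an unweighted mean; the actual ensemble rule is not
    stated in the abstract."""
    probs = (p_plm, p_gnn, p_tabular)
    if any(not 0.0 <= p <= 1.0 for p in probs):
        raise ValueError("probabilities must lie in [0, 1]")
    return sum(probs) / 3.0


def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probability and the
    observed outcome (1 = pathogenic, 0 = benign). Lower is better;
    always predicting 0.5 scores 0.25."""
    if len(probabilities) != len(outcomes) or not probabilities:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum((p - y) ** 2 for p, y in zip(probabilities, outcomes))
    return total / len(probabilities)
```

Against the 0.25 baseline of an uninformative predictor, the reported Brier score of 0.16 indicates probabilities that carry real but imperfect information.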

16
Harnessing the 100,000 Genomes Project whole genome sequencing data - an unbiased systematic tool to filter by biologically validated regions of functionality

Xiao, S.; Kai, Z.; Brown, D.; Genomics England Research Consortium, ; Shovlin, C. L.

2020-04-02 genetic and genomic medicine 10.1101/2020.03.30.20047209
Top 0.1%
63× avg

Whole genome sequencing (WGS) is championed by the UK National Health Service (NHS) to identify genetic variants that cause particular diseases. The full potential of WGS has yet to be realised, as early data analytic steps prioritise protein-coding genes and effectively ignore the less well annotated non-coding genome, which is rich in transcribed and critical regulatory regions. To address this, we developed a filter, which we call GROFFFY, and validated it in WGS data from hereditary haemorrhagic telangiectasia patients within the 100,000 Genomes Project. Before filter application, the mean number of DNA variants compared to the human reference sequence GRCh38 was 4,867,167 (range 4,786,039-5,070,340), and one-third lay within intergenic areas. GROFFFY removed a mean of 2,812,015 variants per DNA sample. In combination with allele frequency and other filters, GROFFFY enabled a 99.56% reduction in variant number. The proportion of intergenic variants was maintained, and no pathogenic variants in disease genes were lost. We conclude that the filter applied to NHS diagnostic samples in the 100,000 Genomes pipeline offers an efficient method to prioritise intergenic, intronic and coding gDNA variants. Reducing the overwhelming number of variants while retaining functional genome variation of importance to patients enhances the near-term value of WGS in clinical diagnostics.
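The general idea of keeping only variants that fall inside a set of annotated regions can be sketched as follows. This is not the GROFFFY implementation, whose region definitions and file formats are not given in the abstract. The half-open coordinate convention, the tuple layout, and all function names are assumptions made for illustration.

```python
from bisect import bisect_right


def build_index(regions):
    """Group (chrom, start, end) regions by chromosome, then sort and merge
    overlapping intervals. Intervals are treated as half-open: [start, end)."""
    by_chrom = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = []
        for start, end in intervals:
            if merged and start <= merged[-1][1]:
                # Overlaps or touches the previous interval: extend it.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        index[chrom] = ([s for s, _ in merged], merged)
    return index


def in_regions(index, chrom, pos):
    """True if pos lies inside any merged interval on chrom."""
    if chrom not in index:
        return False
    starts, merged = index[chrom]
    i = bisect_right(starts, pos) - 1
    return i >= 0 and merged[i][0] <= pos < merged[i][1]


def filter_variants(variants, index):
    """Keep variants, given as (chrom, pos, ...) tuples, inside the regions."""
    return [v for v in variants if in_regions(index, v[0], v[1])]
```

Merging the intervals first lets each lookup use a single binary search, which matters when several million variants per sample are tested against the region set.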

17
Melanocortin-1 receptor (MC1R) genotypes do not correlate with size in two cohorts of medium-to-giant congenital melanocytic nevi

Calbet-Llopart, N.; Pascini-Garrigos, M.; Tell-Marti, G.; Potrony, M.; Martins da Silva, V.; Barreiro, A.; Puig, S.; Captier, G.; James, I.; Degardin, N.; Carrera, C.; Malvehy, J.; Etchevers, H. C.; Puig-Butille, J. A.

2020-04-14 dermatology 10.1101/2020.04.10.20055301
Top 0.1%
63× avg

Congenital melanocytic nevi (CMN) are cutaneous malformations whose prevalence is inversely correlated with projected adult size. CMN are caused by somatic mutations, but epidemiological studies suggest that germline genetic factors may influence CMN development. In CMN patients from the U.K., genetic variants in the MC1R gene, such as p.V92M and loss-of-function variants, have been previously associated with larger CMN. We analyzed the association of MC1R variants with CMN characteristics in 113 medium-to-giant CMN patients from Spain and from a distinct cohort of 53 patients from France, Norway, Canada and the U.S. These cohorts were similar at the clinical and phenotypical level, except for the number of nevi per patient. We found that the p.V92M or loss-of-function MC1R variants, either alone or in combination, did not correlate with CMN size, in contrast to the U.K. CMN patients. An additional case-control analysis with 259 unaffected Spanish individuals showed a higher frequency of MC1R compound heterozygous or homozygous variant genotypes in Spanish CMN patients compared to the control population (15.9% vs. 9.3%; P=0.075). Altogether, this study suggests that MC1R variants are not associated with CMN size in these non-U.K. cohorts. Additional studies are required to define the potential role of MC1R as a risk factor in CMN development. SIGNIFICANCE: Congenital melanocytic nevi (CMN) are common pigmented lesions that originate during prenatal life, without clear evidence of a genetic predisposition. To date, limited data exist regarding the role of the MC1R gene, a key regulator of human pigmentation, in the development of the class of rarer CMN that are greater than 10 cm diameter at projected adult size and associated with increased morbidity or mortality risks. 
This study provides data from a large set of such CMN patients to support the hypothesis that MC1R could be involved in the development of these types of lesions, but at the same time discounting its influence on the size of CMN across distinct populations. Improving our understanding of genetic susceptibility to rare types of CMN is necessary to determine whether routine germline genotyping is relevant in clinical practice.

18
A postzygotic GNA13 variant upregulates the RHOA/ROCK pathway and alters melanocyte function in a mosaic skin hypopigmentation syndrome

El Masri, R.; Iannuzzo, A.; Kuentz, P.; Tacine, R.; Vincent, M.; Barbarot, S.; Morice-Picard, F.; Boralevi, F.; Oillarburu, N.; Mazereeuw-Hautier, J.; Duffourd, Y.; Faivre, L.; Sorlin, A.; Vabres, P.; Delon, J.

2024-07-24 dermatology 10.1101/2024.07.24.24310661
Top 0.1%
62× avg

The genetic bases of mosaic pigmentation disorders have increasingly been identified, but these conditions remain poorly characterised, and their pathophysiology is unclear. Here, we report in four unrelated patients that a recurrent postzygotic mutation in GNA13 is responsible for a recognizable syndrome with hypomelanosis of Ito associated with developmental anomalies. GNA13 encodes Gα13, a subunit of αβγ heterotrimeric G proteins coupled to specific transmembrane receptors known as G-protein coupled receptors. In-depth functional investigations revealed that this R200K mutation provides a gain of function to Gα13. Mechanistically, we show that this variant hyperactivates the RHOA/ROCK signalling pathway, which consequently increases actin polymerisation and myosin light chain phosphorylation and promotes melanocyte rounding. Our results also indicate that R200K Gα13 hyperactivates the YAP signalling pathway. All these changes appear to affect cell migration and adhesion but not proliferation. Our results suggest that hypopigmentation can result from a defect in melanosome transfer to keratinocytes due to cell shape alterations. These findings highlight the interaction between heterotrimeric G proteins and the RHOA pathway, and their role in melanocyte function.

19
Clinical population genetic analysis of variants in the SARS-CoV-2 receptor ACE2

Ardeshirdavani, A.; Zakeri, P.; Mehrtash, A.; Hosseini, S. M.; Li, G.; Mirtavoos-Mahyari, H.; Soltanpour, M. j.; Tavallaie, M.; Moreau, Y.

2020-05-29 genetic and genomic medicine 10.1101/2020.05.27.20115071
Top 0.1%
62× avg

Purpose: SARS-CoV-2 infects cells via the human angiotensin-converting enzyme 2 (ACE2) protein. The genetic variation of ACE2 function and expression across populations is still poorly understood. This study aims at better understanding the genetic basis of COVID-19 outcomes by studying the association between genetic variation in ACE2 and disease severity in the Iranian population. Methods: We analyzed two large Iranian cohorts and several publicly available human population variant databases to identify novel and previously known ACE2 exonic variants present in the Iranian population and considered those as candidate variants for association between genetic variation and disease severity. We genotyped these variants across three groups of COVID-19 patients with different clinical outcomes (mild disease, severe disease, and death) and evaluated this genetic variation with regard to clinical outcomes. Results: We identified 32 exonic variants present in Iranian cohorts or other public variant databases. Among those, 11 variants are novel and have thus not been described in other populations previously. Following genotyping of these 32 candidate variants, only the synonymous polymorphism (c.2247G>A) was detected across the three groups of COVID-19 patients. Conclusion: Genetic variability of known and novel exonic variants was low among our COVID-19 patients. Our results do not provide support for the hypothesis that exonic variation in ACE2 has a sizeable impact on COVID-19 severity across the Iranian population.

20
Minimizing biological risk for novel inhibitory drug targets: One knockout is all you need

Dimitriev, A.; Postovit, L.-M.; Simpson, A. L.; Wong, G. K.-S.

2024-06-20 genetic and genomic medicine 10.1101/2024.06.19.24309116
Top 0.1%
62× avg

We argue that biological risk for novel inhibitory drug targets can be minimized, almost eliminated, by a computational analysis of the healthcare records and DNA sequences in resources like UK Biobank or All-of-Us. The key insight is that an inhibitory drug is functionally equivalent to a loss-of-function (LOF) variant in the targeted gene. It is a special case of what has been called an "experiment of nature". To demonstrate, we considered all available clinical trials (58 in total) and inhibitory drugs (15 in total) for 5 cardiovascular drug targets: PCSK9, APOC3, ANGPTL3, LPA, and ASGR1. The results were shocking. Every biomarker assessed in these clinical trials was successfully predicted, i.e. directionality and proportionality of effect, but not the magnitude since that varies with dosage. This concept has not been widely adopted because geneticists believe that homozygous LOFs, which are exceedingly rare, would be needed to observe a significant phenotypic effect from most genetic knockouts. Our study shows that, to the contrary, given a sufficiently large biobank, counting both carriers and non-carriers, heterozygous LOFs alone can inform drug development.
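The comparison this abstract describes, heterozygous LOF carriers against non-carriers in a biobank, can be sketched as a difference in mean biomarker level with a standard error. This is a minimal illustration under stated assumptions, not the authors' analysis: it ignores covariates such as age, sex and ancestry, and the function name `carrier_effect` is hypothetical.

```python
import math
from statistics import mean, variance


def carrier_effect(biomarker, is_carrier):
    """Difference in mean biomarker level (carriers minus non-carriers)
    and its standard error, using unequal-variance (Welch) pooling.
    The sign of the difference is the predicted direction of effect
    for an inhibitory drug against the same gene."""
    carriers = [x for x, c in zip(biomarker, is_carrier) if c]
    others = [x for x, c in zip(biomarker, is_carrier) if not c]
    if len(carriers) < 2 or len(others) < 2:
        raise ValueError("need at least two carriers and two non-carriers")
    diff = mean(carriers) - mean(others)
    se = math.sqrt(variance(carriers) / len(carriers)
                   + variance(others) / len(others))
    return diff, se
```

A negative difference for a lipid biomarker among carriers of LOF variants in a target gene would predict that inhibiting the gene lowers that biomarker, matching the abstract's point that direction can be predicted while magnitude depends on dosage.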